
    Gaussian approximation for the sup-norm of high-dimensional matrix-variate U-statistics and its applications

    This paper studies the Gaussian approximation of high-dimensional and non-degenerate U-statistics of order two under the supremum norm. We propose a two-step Gaussian approximation procedure that does not impose structural assumptions on the data distribution. Specifically, subject to mild moment conditions on the kernel, we establish an explicit rate of convergence that decays polynomially in the sample size in a high-dimensional scaling regime, where the dimension can be much larger than the sample size. We also provide a practical Gaussian wild bootstrap method to approximate the quantiles of the maxima of centered U-statistics and prove its asymptotic validity. The wild bootstrap is demonstrated on statistical applications for high-dimensional non-Gaussian data, including: (i) principled and data-dependent tuning parameter selection for regularized estimation of the covariance matrix and its related functionals; (ii) simultaneous inference for the covariance and rank correlation matrices. In particular, for the thresholded covariance matrix estimator with the bootstrap-selected tuning parameter, we show that Gaussian-like convergence rates can be achieved for heavy-tailed data; these rates are less conservative than those obtained by the Bonferroni technique, which ignores the dependency in the underlying data distribution. In addition, we show that even for sub-Gaussian distributions, error bounds of the bootstrapped thresholded covariance matrix estimator can be much tighter than those of the minimax estimator with a universal threshold.
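    A minimal sketch of a Gaussian wild bootstrap for the sup-norm of a centered order-two U-statistic is given below. It follows one common multiplier-bootstrap recipe rather than the paper's exact procedure; the covariance-type kernel and all function names are illustrative assumptions.

```python
import numpy as np

def wild_bootstrap_quantile(X, kernel, B=200, alpha=0.05, rng=None):
    """Approximate the (1 - alpha)-quantile of the sup-norm of a centered
    order-two U-statistic by a Gaussian wild bootstrap (one common variant;
    not necessarily the procedure of the paper)."""
    rng = rng or np.random.default_rng(0)
    n = X.shape[0]
    idx = [(i, j) for i in range(n) for j in range(i + 1, n)]
    H = np.stack([kernel(X[i], X[j]) for i, j in idx])   # kernel evaluations
    U = H.mean(axis=0)                                   # the U-statistic
    Hc = H - U                                           # centered kernel evaluations
    draws = np.empty(B)
    for b in range(B):
        e = rng.standard_normal(n)                       # Gaussian multipliers
        w = np.array([e[i] * e[j] for i, j in idx])      # product multipliers for pairs
        draws[b] = np.max(np.abs(np.tensordot(w, Hc, axes=1) / len(idx)))
    return np.quantile(draws, 1 - alpha)

# Illustrative covariance-type kernel h(x, y) = (x - y)(x - y)^T / 2.
X = np.random.default_rng(1).standard_normal((50, 5))
q = wild_bootstrap_quantile(X, lambda x, y: np.outer(x - y, x - y) / 2)
```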

    A Note on Moment Inequality for Quadratic Forms

    Moment inequalities for quadratic forms of random vectors are of particular interest in covariance matrix testing and estimation problems. In this paper, we prove a Rosenthal-type inequality, which exhibits new features and a certain improvement over the unstructured Rosenthal inequality for quadratic forms when the dimension of the vectors increases without bound. Applications to testing block-diagonal structure and detecting sparsity in high-dimensional covariance matrices are presented.
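    For context, the classical (unstructured) Rosenthal inequality that such results refine can be stated as follows; the refined quadratic-form version of the paper is not reproduced here.

```latex
% Classical Rosenthal inequality: for independent, mean-zero random variables
% X_1, ..., X_n and any p >= 2, there is a constant C_p depending only on p with
\mathbb{E}\Bigl|\sum_{i=1}^{n} X_i\Bigr|^{p}
  \;\le\; C_p \Biggl\{ \sum_{i=1}^{n} \mathbb{E}|X_i|^{p}
  \;+\; \Bigl( \sum_{i=1}^{n} \mathbb{E} X_i^{2} \Bigr)^{p/2} \Biggr\}.
```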

    A robust bootstrap change point test for high-dimensional location parameter

    We consider the problem of change point detection for high-dimensional distributions in a location family when the dimension can be much larger than the sample size. In change point analysis, the widely used cumulative sum (CUSUM) statistics are sensitive to outliers and heavy-tailed distributions. In this paper, we propose a robust, tuning-free (i.e., fully data-dependent), and easy-to-implement change point test that enjoys strong theoretical guarantees. To achieve robustness in a nonparametric setting, we formulate the change point detection problem in a multivariate U-statistics framework with anti-symmetric and nonlinear kernels. Specifically, the within-sample noise is canceled out by the anti-symmetry of the kernel, while the signal distortion under certain nonlinear kernels can be controlled so that the magnitude of the between-sample change point signal is preserved. A (half) jackknife multiplier bootstrap (JMB), tailored to the change point detection setting, is proposed to calibrate the distribution of our ℓ∞-norm aggregated test statistic. Subject to mild moment conditions on the kernels, we derive uniform rates of convergence for the JMB to approximate the sampling distribution of the test statistic, and analyze its size and power properties. Extensions to multiple change point testing and estimation are discussed with illustrations from numerical studies.
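    As an illustration of the anti-symmetric-kernel idea, the sketch below scans candidate change points with a componentwise sign kernel and ℓ∞ aggregation; the weighting, boundary handling, and the half-jackknife multiplier bootstrap calibration of the paper are omitted, and all names are hypothetical.

```python
import numpy as np

def antisym_scan(X):
    """l_inf-aggregated scan over candidate change points using the
    anti-symmetric componentwise sign kernel h(x, y) = sign(x - y)."""
    n, d = X.shape
    scores = np.empty(n - 1)
    for k in range(1, n):                            # split: first k vs. last n - k observations
        S = np.zeros(d)
        for i in range(k):
            S += np.sign(X[i] - X[k:]).sum(axis=0)   # sum of h(X_i, X_l) over l >= k
        scores[k - 1] = np.max(np.abs(S)) / (k * (n - k))
    return scores

X = np.random.default_rng(2).standard_normal((60, 20))
X[30:, :5] += 1.0                                    # mean shift in the first 5 coordinates
k_hat = int(np.argmax(antisym_scan(X))) + 1          # estimated change point location
```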

    Inference in Kingman's Coalescent with Particle Markov Chain Monte Carlo Method

    We propose a new algorithm for posterior sampling of Kingman's coalescent, based on the Particle Markov Chain Monte Carlo methodology. Specifically, the algorithm is an instantiation of the Particle Gibbs sampling method, which alternately samples coalescent times conditioned on coalescent tree structures, and tree structures conditioned on coalescent times via the conditional Sequential Monte Carlo procedure. We implement our algorithm as a C++ package and demonstrate its utility on a parameter estimation task in population genetics with both single- and multiple-locus data. The experimental results show that the proposed algorithm performs comparably to or better than several well-established methods.

    Randomized incomplete U-statistics in high dimensions

    This paper studies inference for the mean vector of a high-dimensional U-statistic. In the era of Big Data, the dimension d of the U-statistic and the sample size n of the observations are typically both large, and the computation of the U-statistic is prohibitively demanding. Data-dependent inferential procedures such as the empirical bootstrap for U-statistics are even more computationally expensive. To overcome this computational bottleneck, incomplete U-statistics obtained by sampling fewer terms of the U-statistic are attractive alternatives. In this paper, we introduce randomized incomplete U-statistics with sparse weights whose computational cost can be made independent of the order of the U-statistic. We derive non-asymptotic Gaussian approximation error bounds for the randomized incomplete U-statistics in high dimensions, namely in cases where the dimension d is possibly much larger than the sample size n, for both non-degenerate and degenerate kernels. In addition, we propose generic bootstrap methods for the incomplete U-statistics that are computationally much less demanding than existing bootstrap methods, and establish finite sample validity of the proposed bootstrap methods. Our methods are illustrated with an application to nonparametric testing of pairwise independence of a high-dimensional random vector under weaker assumptions than those appearing in the literature.
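    A minimal sketch of the Bernoulli-sampling construction for an order-two incomplete U-statistic is shown below; the kernel, the computational budget N, and the function name are illustrative assumptions, and the paper's sparse-weight scheme for general orders is not reproduced.

```python
import numpy as np

def incomplete_u_stat(X, kernel, N, rng):
    """Order-two incomplete U-statistic: average the kernel over a
    Bernoulli-sampled subset of pairs so that about N terms are evaluated."""
    n = X.shape[0]
    p = min(1.0, N / (n * (n - 1) / 2))          # pair-sampling probability
    vals = []
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:                 # sparse Bernoulli weight
                vals.append(kernel(X[i], X[j]))
    return np.mean(vals, axis=0)

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 30))
# Componentwise kernel h(x, y) = (x - y)^2 / 2, whose mean is the variance vector.
U_hat = incomplete_u_stat(X, lambda x, y: (x - y) ** 2 / 2, N=2000, rng=rng)
```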

    Jackknife multiplier bootstrap: finite sample approximations to the U-process supremum with applications

    This paper is concerned with finite sample approximations to the supremum of a non-degenerate U-process of a general order indexed by a function class. We are primarily interested in situations where the function class as well as the underlying distribution change with the sample size, and the U-process itself is not weakly convergent as a process. Such situations arise in a variety of modern statistical problems. We first consider Gaussian approximations, namely, approximating the U-process supremum by the supremum of a Gaussian process, and derive coupling and Kolmogorov distance bounds. Such Gaussian approximations are, however, often not directly applicable in statistical problems since the covariance function of the approximating Gaussian process is unknown. This motivates us to study bootstrap-type approximations to the U-process supremum. We propose a novel jackknife multiplier bootstrap (JMB) tailored to the U-process, and derive coupling and Kolmogorov distance bounds for the proposed JMB method. All these results are non-asymptotic and established under fairly general conditions on function classes and underlying distributions. Key technical tools in the proofs are new local maximal inequalities for U-processes, which may be useful in other problems. We also discuss applications of the general approximation results to testing for qualitative features of nonparametric functions based on generalized local U-processes.
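    The sketch below illustrates a jackknife multiplier bootstrap for the sup-norm of a non-degenerate order-two U-statistic: Gaussian multipliers are applied to centered leave-one-out (jackknife) estimates of the Hájek projection. It is a simplified reading of the JMB idea for a finite coordinate class, not the paper's general construction, and the kernel and names are assumptions.

```python
import numpy as np

def jmb_sup_draws(X, kernel, B=200, rng=None):
    """Bootstrap draws of the sup-norm via Gaussian multipliers applied to
    centered leave-one-out (jackknife) estimates of the Hajek projection."""
    rng = rng or np.random.default_rng(0)
    n = X.shape[0]
    H = np.stack([np.stack([kernel(X[i], X[j]) for j in range(n)]) for i in range(n)])
    mask = ~np.eye(n, dtype=bool)
    G = np.stack([H[i][mask[i]].mean(axis=0) for i in range(n)])   # leave-one-out means
    Gc = G - G.mean(axis=0)                                        # centered projections
    draws = np.empty(B)
    for b in range(B):
        xi = rng.standard_normal(n)                                # Gaussian multipliers
        draws[b] = np.max(np.abs(xi @ Gc)) / np.sqrt(n)
    return draws

X = np.random.default_rng(4).standard_normal((100, 10))
draws = jmb_sup_draws(X, lambda x, y: (x - y) ** 2 / 2)            # illustrative kernel
crit = np.quantile(draws, 0.95)
```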

    Finite sample change point inference and identification for high-dimensional mean vectors

    Cumulative sum (CUSUM) statistics are widely used in change point inference and identification. For the problem of testing for the existence of a change point in an independent sample generated from the mean-shift model, we introduce a Gaussian multiplier bootstrap to calibrate the critical values of the CUSUM test statistics in high dimensions. The proposed bootstrap CUSUM test is fully data-dependent and has strong theoretical guarantees under arbitrary dependence structures and mild moment conditions. Specifically, we show that with a boundary removal parameter the bootstrap CUSUM test enjoys uniform validity in size under the null and achieves the minimax separation rate under sparse alternatives when the dimension p can be larger than the sample size n. Once a change point is detected, we estimate the change point location by maximizing the ℓ∞-norm of the generalized CUSUM statistics at two different weighting scales, corresponding to covariance-stationary and non-stationary CUSUM statistics. For both estimators, we derive their rates of convergence and show that the dimension impacts the rates only through logarithmic factors, which implies that consistency of the CUSUM estimators is possible when p is much larger than n. In the presence of multiple change points, we propose a principled bootstrap-assisted binary segmentation (BABS) algorithm to dynamically adjust the change point detection rule and recursively estimate the change point locations. We derive its rate of convergence under suitable signal separation and strength conditions. The results derived in this paper are non-asymptotic, and we provide extensive simulation studies to assess the finite sample performance. The empirical evidence shows an encouraging agreement with our theoretical results.
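    A minimal sketch of a CUSUM scan with a Gaussian multiplier bootstrap calibration is given below. The weighting and boundary removal are simplified relative to the paper, and the function names and trimming fraction are assumptions.

```python
import numpy as np

def cusum_sup(X, trim=0.1):
    """Sup-norm CUSUM statistic over boundary-trimmed candidate change points."""
    n = X.shape[0]
    stats = []
    for k in range(int(trim * n), int((1 - trim) * n) + 1):
        diff = X[:k].mean(axis=0) - X[k:].mean(axis=0)
        stats.append(np.sqrt(k * (n - k) / n) * np.max(np.abs(diff)))
    return max(stats)

def bootstrap_critical_value(X, B=200, alpha=0.05, trim=0.1, rng=None):
    """Gaussian multiplier bootstrap for the CUSUM sup-statistic under the null."""
    rng = rng or np.random.default_rng(0)
    Xc = X - X.mean(axis=0)                       # center under the no-change null
    draws = np.empty(B)
    for b in range(B):
        e = rng.standard_normal((X.shape[0], 1))  # one Gaussian multiplier per observation
        draws[b] = cusum_sup(e * Xc, trim=trim)
    return np.quantile(draws, 1 - alpha)

X = np.random.default_rng(5).standard_normal((150, 100))
reject = cusum_sup(X) > bootstrap_critical_value(X)
```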

    Inference of high-dimensional linear models with time-varying coefficients

    We propose a pointwise inference algorithm for high-dimensional linear models with time-varying coefficients. The method is based on a novel combination of the nonparametric kernel smoothing technique and a Lasso bias-corrected ridge regression estimator. Owing to the non-stationary nature of the model, a dynamic bias-variance decomposition of the estimator is obtained. With a bias-correction procedure, the local null distribution of the estimator of the time-varying coefficient vector is characterized for i.i.d. Gaussian and heavy-tailed errors. The limiting null distribution is also established for Gaussian process errors, and we show that the asymptotic properties differ between short-range and long-range dependent errors. P-values are adjusted by a Bonferroni-type correction procedure to control the familywise error rate (FWER) in the asymptotic sense at each time point. The finite sample performance of the proposed inference algorithm is illustrated with synthetic data and an application to learning brain connectivity from resting-state fMRI data for Parkinson's disease.
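    The sketch below shows only the kernel-smoothing step combined with a weighted Lasso fit at a fixed time point; the ridge-type bias correction and the inference step of the paper are not reproduced, and the bandwidth, penalty level, and names are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def local_lasso(t, X, y, t0, bandwidth, lam):
    """Kernel-weighted Lasso estimate of the coefficient vector at time t0.
    Weighted least squares is implemented by scaling rows with sqrt(weights)."""
    w = np.exp(-0.5 * ((t - t0) / bandwidth) ** 2)   # Gaussian kernel weights
    sw = np.sqrt(w)[:, None]
    model = Lasso(alpha=lam, max_iter=10000)
    model.fit(sw * X, sw[:, 0] * y)
    return model.coef_

rng = np.random.default_rng(6)
n, p = 300, 50
t = np.linspace(0.0, 1.0, n)
X = rng.standard_normal((n, p))
y = np.sin(2 * np.pi * t) * X[:, 0] + 0.1 * rng.standard_normal(n)   # one time-varying coefficient
beta_hat = local_lasso(t, X, y, t0=0.25, bandwidth=0.1, lam=0.05)
```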

    Hanson-Wright inequality in Hilbert spaces with application to K-means clustering for non-Euclidean data

    We derive a dimension-free Hanson-Wright inequality for quadratic forms of independent sub-gaussian random variables in a separable Hilbert space. Our inequality is an infinite-dimensional generalization of the classical Hanson-Wright inequality for finite-dimensional Euclidean random vectors. We illustrate an application to the generalized K-means clustering problem for non-Euclidean data. Specifically, we establish the exponential rate of convergence for a semidefinite relaxation of the generalized K-means, which, together with a simple rounding algorithm, implies exact recovery of the true clustering structure.
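    For reference, the classical finite-dimensional Hanson-Wright inequality that the paper generalizes can be stated as follows, with c an absolute constant and K a bound on the sub-gaussian norms of the coordinates; the Hilbert-space version of the paper is not reproduced here.

```latex
% Classical Hanson-Wright inequality: for a random vector X in R^n with
% independent, mean-zero coordinates satisfying ||X_i||_{psi_2} <= K, and any
% fixed matrix A, for all t >= 0,
\mathbb{P}\bigl( \lvert X^\top A X - \mathbb{E}\, X^\top A X \rvert > t \bigr)
  \;\le\; 2 \exp\!\Bigl[ -c \min\Bigl( \tfrac{t^{2}}{K^{4}\lVert A\rVert_{F}^{2}},
  \tfrac{t}{K^{2}\lVert A\rVert_{\mathrm{op}}} \Bigr) \Bigr].
```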

    Distributed Consensus Resilient to Both Crash Failures and Strategic Manipulations

    In this paper, we study distributed consensus in synchronous systems subject to both unexpected crash failures and strategic manipulations by rational agents in the system. We adapt the concept of collusion-resistant Nash equilibrium to model protocols that are resilient to both crash failures and strategic manipulations of a group of colluding agents. For a system with n distributed agents, we design a deterministic protocol that tolerates 2 colluding agents and a randomized protocol that tolerates n - 1 colluding agents, and both tolerate any number of crash failures. We also show that if colluders are allowed an extra communication round after each synchronous round, there is no protocol that can tolerate even 2 colluding agents and 1 crash failure.